Exploiting Multigrain Parallelism in Pairwise Sequence Search on Emergent CMP Architectures
نویسنده
چکیده
(ABSTRACT) With the emerging hybrid multi-core and many-core compute platforms delivering unprecedented high performance within a single chip, and making rapid strides toward the commodity processor market, they are widely expected to replace the multi-core processors in the existing High-Performance Computing (HPC) infrastructures, such as large scale clusters, grids and supercomputers. On the other hand in the realm of bioinformatics, the size of genomic databases is doubling every 12 months, and hence the need for novel approaches to parallelize sequence search algorithms has become increasingly important. This thesis puts a significant step forward in bridging the gap between software and hardware by presenting an efficient and scalable model to accelerate one of the popular sequence alignment algorithms by exploiting multigrain parallelism that is exposed by the emerging multiprocessor architec-tures. Specifically, we parallelize a dynamic programming algorithm called Smith-Waterman both within and across multiple Cell Broadband Engines and within an nVIDIA GeForce General Purpose Graphics Processing Unit (GPGPU). Cell Broadband Engine: We parallelize the Smith-Waterman algorithm within a Cell node by performing a blocked data decomposition of the dynamic programming matrix followed by pipelined execution of the blocks across the synergistic processing elements (SPEs) of the Cell. We also introduce novel optimization methods that completely utilize the vector processing power of the SPE. As a result, we achieve near-linear scalability or near-constant efficiency for up to 16 SPEs on the dual-Cell QS20 blades, and our design is highly scalable to more cores, if available. We further extend this design to accelerate the Smith-Waterman algorithm across nodes on both the IBM QS20 and the PlayStation3 Cell cluster platforms and achieve a maximum speedup of 44, when compared to the execution times on a single Cell node. We then introduce an analytical model to accurately estimate the execution times of parallel sequence alignments and wavefront algorithms in general on the Cell cluster platforms. Lastly, we contribute and evaluate TOSS – a Throughput-Oriented Sequence Scheduler, which leverages the performance prediction model and dynamically partitions the available processing elements to simultaneously align multiple sequences. This scheme succeeds in aligning more sequences per unit time with an improvement of 33.5% over the na¨ıve first-come, first-serve (FCFS) scheduler. nVIDIA GPGPU: We parallelize the Smith-Waterman algorithm on the GPGPU by optimizing the code in stages, which include optimal data layout strategies, coalesced memory accesses and blocked data decomposition techniques. Results show that our methods provide a maximum speedup …
منابع مشابه
Multigrain Parallel Processing on OSCAR Chip Multiprocessor
This paper describes multigrain parallel processing on OSCAR Chip Multiprocessor (OSCAR CMP). The aim of OSCAR CMP is to achieve both of scalable performance improvement with effective use of huge number of transistors on a chip and high efficiency of application development with compiler supports. OSCAR CMP integrates simple single issue processors having local data memory for private data rec...
متن کاملA multigrain Delaunay mesh generation method for multicore SMT-based architectures
Given the proliferation of layered, multicoreand SMT-based architectures, it is imperative to deploy and evaluate important, multi-level, scientific computing codes, such as meshing algorithms, on these systems. We focus on Parallel Constrained Delaunay Mesh (PCDM) generation. We exploit coarse-grain parallelism at the subdomain level, medium-grain at the cavity level and fine-grain at the elem...
متن کاملCoarse-Grain Task Parallel Processing Using the OpenMP Backend of the OSCAR Multigrain Parallelizing Compiler
This paper describes automatic coarse grain parallel processing on a shared memory multiprocessor system using a newly developed OpenMP backend of OSCAR multigrain parallelizing compiler for from single chip multiprocessor to a high performance multiprocessor and a heterogeneous supercomputer cluster. OSCAR multigrain parallelizing compiler exploits coarse grain task parallelism and near ne gra...
متن کاملA Chip-Multiprocessor Architecture with Speculative Multithreading
ÐMuch emphasis is now placed on chip-multiprocessor (CMP) architectures for exploiting thread-level parallelism in an application. In such architectures, speculation may be employed to execute applications that cannot be parallelized statically. In this paper, we present an efficient CMP architecture for speculative execution of sequential binaries without source recompilation. We present the s...
متن کاملRuntime Support for Multigrain and Multiparadigm Parallelism
This paper presents a general methodology for implementing on clusters the runtime support for a two-level dependence-driven thread model, initially targeted to shared-memory multiprocessors. The general ideal is to exploit existing programming solutions for these architectures, like Software DSM (SWDSM) and Message Passing Interface. The management of the internal runtime system structures and...
متن کامل